[GLUTEN-7600][VL] Add monotonically_increasing_id function mapping#11674

Open
n0r0shi wants to merge 1 commit into apache:main from n0r0shi:feat/monotonically-increasing-id

n0r0shi commented Feb 28, 2026

Summary

  • Adds Sig[MonotonicallyIncreasingID] to ExpressionMappings.SCALAR_SIGS so the function is offloaded to Velox instead of falling back to vanilla Spark.
  • Sets Velox's expression.dedup_non_deterministic to false to match Spark semantics: Spark never deduplicates non-deterministic expressions, so each call has independent state.
  • Un-ignores and fixes the test in ScalarFunctionsValidateSuite.

Context

PR #10097 previously attempted this but was closed because of a result mismatch (#7628): SELECT monotonically_increasing_id(), monotonically_increasing_id() returned

  ┌─────┬───────┬───────┐
  │ Row │ Col 1 │ Col 2 │
  ├─────┼───────┼───────┤
  │ 0   │ 0     │ 2     │
  ├─────┼───────┼───────┤
  │ 1   │ 1     │ 3     │
  └─────┴───────┴───────┘

instead of Spark's expected

  ┌─────┬───────┬───────┐
  │ Row │ Col 1 │ Col 2 │
  ├─────┼───────┼───────┤
  │ 0   │ 0     │ 0     │
  ├─────┼───────┼───────┤
  │ 1   │ 1     │ 1     │
  └─────┴───────┴───────┘

The root cause was Velox's expression compiler deduplicating the two structurally identical calls into one shared counter instance.
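
The mismatch can be reproduced with a small simulation. The sketch below is a simplified model, not Velox's actual code: it assumes a projection is evaluated one column at a time over a batch, and `Counter`, `project`, and the batch size of 2 are hypothetical names and values chosen only for illustration.

```python
class Counter:
    """Stateful stand-in for monotonically_increasing_id() within one partition."""

    def __init__(self):
        self.next_id = 0

    def eval_batch(self, num_rows):
        # Produce one ID per row and advance the internal state.
        ids = list(range(self.next_id, self.next_id + num_rows))
        self.next_id += num_rows
        return ids


def project(exprs, num_rows):
    """Evaluate each output column over the batch, then zip columns into rows."""
    columns = [expr.eval_batch(num_rows) for expr in exprs]
    return [list(row) for row in zip(*columns)]


# Deduplicated: both output columns reference the same counter instance.
shared = Counter()
deduped_rows = project([shared, shared], num_rows=2)

# Not deduplicated: each call site owns its own counter.
independent_rows = project([Counter(), Counter()], num_rows=2)

print(deduped_rows)      # [[0, 2], [1, 3]]
print(independent_rows)  # [[0, 0], [1, 1]]
```

Under this model, the second column of the deduplicated case continues from where the first column left off, which matches the mismatched output reported in #7628.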

Velox has since added the expression.dedup_non_deterministic config (facebookincubator/velox#15008) to control this behavior. This PR sets it to false for Gluten. This only affects non-deterministic expressions — deterministic expression deduplication is unchanged.

Question for reviewers: Is setting expression.dedup_non_deterministic = false globally the right approach? An alternative would be conditionally disabling it only when stateful expressions like monotonically_increasing_id are detected in the plan, but we believe the global approach is correct since Spark semantics never deduplicate non-deterministic expressions.
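
For comparison, the conditional alternative could look roughly like the sketch below. Everything in it is hypothetical: `Expr`, `STATEFUL_FUNCTIONS`, `contains_stateful`, and `dedup_non_deterministic` are illustrative names, not Gluten or Velox APIs, and the expression tree is a toy stand-in for a real plan.

```python
from dataclasses import dataclass, field

# Hypothetical set of function names treated as stateful.
STATEFUL_FUNCTIONS = {"monotonically_increasing_id"}


@dataclass
class Expr:
    """Toy expression node: a function or column name plus child expressions."""
    name: str
    children: list = field(default_factory=list)


def contains_stateful(expr):
    """Return True if expr or any descendant is a stateful function call."""
    if expr.name in STATEFUL_FUNCTIONS:
        return True
    return any(contains_stateful(child) for child in expr.children)


def dedup_non_deterministic(projections):
    """Conditional policy: keep dedup enabled unless a stateful call appears."""
    return not any(contains_stateful(e) for e in projections)


plain = [Expr("add", [Expr("col_a"), Expr("literal_1")])]
with_id = [Expr("cast", [Expr("monotonically_increasing_id")])]

print(dedup_non_deterministic(plain))    # True
print(dedup_non_deterministic(with_id))  # False
```

The conditional policy needs a plan walk and a maintained list of stateful functions; the global setting proposed in this PR avoids both.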

Closes #7628

Related issue: #7600

Adds `Sig[MonotonicallyIncreasingID]` to `ExpressionMappings.SCALAR_SIGS`
so the function is offloaded to Velox instead of falling back to vanilla
Spark.

Also sets Velox's `expression.dedup_non_deterministic` to `false`. By
default Velox deduplicates structurally identical non-deterministic
expression trees, merging them into a single instance with shared state.
This is incorrect for Spark semantics where each non-deterministic call
has independent state — e.g. `SELECT monotonically_increasing_id(),
monotonically_increasing_id()` must return [0,0],[1,1] (two independent
counters), not [0,2],[1,3] (one shared counter).

For seeded functions like `rand(42)`, disabling dedup is safe: each
independent instance produces the same sequence from the same seed,
matching Spark's behavior either way.
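
A minimal sketch of the seeded case, assuming each call site constructs its own generator from the literal seed. Python's `random.Random` is only a stand-in here (Spark and Velox use their own generators), and `seeded_column` is a hypothetical helper:

```python
import random


def seeded_column(seed, num_rows):
    # Each call site builds its own generator from the literal seed.
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_rows)]


# Two independent instances of a rand(42)-like expression.
col_a = seeded_column(42, num_rows=3)
col_b = seeded_column(42, num_rows=3)

print(col_a == col_b)  # True
```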

Un-ignores and fixes the corresponding test in
`ScalarFunctionsValidateSuite`.

Closes apache#7628

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Feb 28, 2026
@github-actions

Run Gluten Clickhouse CI on x86

n0r0shi marked this pull request as ready for review March 4, 2026 21:07

Development

Successfully merging this pull request may close these issues.

[VL] Result mismatch on monotonically_increasing_id
